Bacteriophages are the top predators of the bacterial world. They infect bacteria and shape the diversity of the microbial community. This is comparable to the effect of top predators such as wolves and bears in Yellowstone National Park.
Phages not only infect bacteria; they can also have different effects on their hosts. For example, they can affect the course of a bacterial infection. In fact, based on evolutionary reasoning, prophages are postulated to contribute genes that increase the fitness of lysogenic bacteria in their specific ecological niche.
Phages can have different life cycles. The best known are the lytic and the lysogenic cycle. In the lysogenic cycle, the phage integrates into the host genome, where it can be actively transcribed or excise itself again. Lytic phages, on the other hand, do not integrate: they replicate within the cell without entering the host genome.
The two main life cycles of phages. Source: Wikipedia.
Phages evolve very quickly and their diversity is tremendous, which makes their taxonomy far from simple. It is hard to find commonalities across all phage genomes. Phages were therefore originally classified based on their lytic functions, particle structure and nucleic acid structure. On the other hand, people try to infer molecular phylogenetic trees from the few common genes. The body that organises the naming of phages is the International Committee on Taxonomy of Viruses (ICTV).
Here, we look at the diversity of phages in the assembled Bifidobacteria from social bees. We focus on two main areas:
1. The prophages within the assembled Bifidobacteria
2. The CRISPRs (the bacterial immune system) of the Bifidobacteria
We start off by looking for prophages within the Bifidobacteria. Prophages are phages that have integrated into the bacterial genome and piggyback on the host for a while. From time to time they replicate and lyse the bacterial cell, which bursts and releases the new phage particles.
Before we get going, however, we prepare the genome files for the tools we intend to use. To keep things simple, I also reduce the headers of the genome files to a minimum.
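The header reduction can be sketched roughly as follows. This is a minimal sketch, assuming that "reducing to a minimum" means keeping only the identifier before the first whitespace; the function names `simplify_fasta_headers` and `simplify_fasta_file` are hypothetical and not part of any tool used here.

```python
def simplify_fasta_headers(lines):
    """Yield FASTA lines with each header cut down to its first word.

    '>contig_1 length=5000 cov=12.3' becomes '>contig_1'.
    Sequence lines are passed through unchanged.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Keep only the identifier before the first whitespace.
            fields = line[1:].split()
            yield ">" + (fields[0] if fields else "")
        else:
            yield line


def simplify_fasta_file(in_path, out_path):
    """Write a copy of in_path with minimal headers to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in simplify_fasta_headers(src):
            dst.write(line + "\n")
```

Short headers matter mostly because some downstream tools truncate or reject long identifiers, so it is worth checking that the shortened names are still unique within each file.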
Here, we aim to annotate prophages, or temperate phages, in the bacterial genomes. There are multiple tools for this. Among the most common ones are:
- VirSorter
- PHASTER
- Phage_Finder
Because of its simplicity and high quality, PHASTER is always a good starting point when searching for prophages. PHASTER searches for known phage genes by homology and tries to identify and illustrate the boundaries of putative prophages. Here we use the online version of PHASTER. However, jobs can also be submitted from the command line via wget.
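A programmatic submission might look roughly like the sketch below, written in Python instead of wget. The endpoint `http://phaster.ca/phaster_api` and the `contigs` and `acc` query parameters are assumptions based on my reading of the PHASTER URL API instructions and should be checked against the current PHASTER documentation before use. The helper names are hypothetical, and the functions only build the requests without sending them.

```python
import urllib.request

# Assumed endpoint; verify against the PHASTER instructions before use.
PHASTER_API = "http://phaster.ca/phaster_api"


def build_phaster_submission(fasta_bytes, multi_contig=False):
    """Build (but do not send) a POST request carrying a FASTA file.

    multi_contig=True adds the assumed 'contigs=1' flag for
    multi-FASTA assemblies.
    """
    url = PHASTER_API + ("?contigs=1" if multi_contig else "")
    return urllib.request.Request(url, data=fasta_bytes, method="POST")


def build_phaster_status_query(job_id):
    """Build (but do not send) a GET request asking for a job's status."""
    url = f"{PHASTER_API}?acc={job_id}"
    return urllib.request.Request(url, method="GET")
```

To actually submit, the request would be passed to `urllib.request.urlopen(...)` and the response saved, so that the returned job identifier can be used for the later status query.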
A slightly more complicated and extensive tool for prophage annotation is VirSorter. VirSorter, however, does not have an online version, and the installation is rather tedious. Here, I run VirSorter on all samples. Additionally, I parse the output to get all the necessary annotation files that VirSorter produces.
Move the results to the local machine.
Virfam is an excellent online resource that annotates the morphogenesis genes of phages and makes a first taxonomic classification. In order to submit to Virfam, we first need to extract all protein FASTA files created by VirSorter.
## Mapping coverage
In order to see if the putative prophages are active, we look at the sequencing coverage. If a phage is active, we would expect its coverage to differ from that of the host genome. We therefore look at the read coverage at the sites of the prophages. To do this, we first need to map the raw reads to the reference genome.
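Once the reads are mapped, the comparison itself can be sketched as a simple ratio of mean depths. This is a minimal sketch, assuming a three-column, tab-separated per-base depth table (contig, position, depth) that lists every position in order, as produced by `samtools depth -a`. The function names and the prophage coordinates are hypothetical inputs, not output of any tool used above.

```python
from statistics import mean


def read_contig_depths(lines, contig):
    """Collect per-base depths for one contig from 'name<TAB>pos<TAB>depth' lines.

    Assumes every position of the contig is listed in order,
    so list index 0 corresponds to position 1.
    """
    depths = []
    for line in lines:
        name, _pos, depth = line.rstrip("\n").split("\t")
        if name == contig:
            depths.append(int(depth))
    return depths


def prophage_coverage_ratio(depths, start, end):
    """Mean depth inside the prophage divided by mean depth outside it.

    start and end are 1-based, inclusive coordinates of the
    putative prophage on the contig.
    """
    inside = depths[start - 1:end]
    outside = depths[:start - 1] + depths[end:]
    if not inside or not outside:
        raise ValueError("prophage must cover part, but not all, of the contig")
    host_mean = mean(outside)
    if host_mean == 0:
        raise ValueError("host region has zero coverage")
    return mean(inside) / host_mean
```

A ratio close to 1 suggests the prophage is covered like the rest of the host genome, whereas a ratio well away from 1 is the kind of difference we are looking for. Where to draw the line is a judgment call and depends on how noisy the coverage is.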
Whenever we want to compare genomes, we first need to gather a number of genomes to compare against. It is always advisable to scan NCBI RefSeq for what is currently available. Here, I explore and download the Bifido diversity on NCBI RefSeq. NCBI RefSeq genomes are curated, selected high-quality genome assemblies.
Here, I look at the diversity of phages.
In addition to NCBI, it is always advisable to consult species- or field-specific literature. In our case, we know that German has previously looked at Bifido phages (see his paper). We therefore also download his prepared genome list from GitHub.
Finally, with all these genomes we can try to create a similarity network. For this we use vConTACT2, a very good tool that compares genomes based on the presence of shared putative viral genes/clusters. However, it is not very intuitive to run. Therefore I precompute the interaction network.
Something to keep in mind is that the interaction network is only as extensive as our data. We should therefore consider searching the internet for additional Bifido phage genomes to include in this analysis.
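To build some intuition for what a gene-sharing network captures, here is a toy sketch that links genomes by the fraction of protein clusters they share (Jaccard index). This is not how vConTACT2 scores or clusters genomes; it is only an illustration, and the cluster names, the function names and the `min_similarity` threshold are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Shared items divided by all items seen in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def gene_sharing_edges(genome_clusters, min_similarity=0.3):
    """Return (genome1, genome2, score) for every pair above the threshold.

    genome_clusters maps a genome name to the set of protein
    clusters found in that genome.
    """
    edges = []
    for g1, g2 in combinations(sorted(genome_clusters), 2):
        score = jaccard(genome_clusters[g1], genome_clusters[g2])
        if score >= min_similarity:
            edges.append((g1, g2, score))
    return edges
```

The sketch also shows why the network depends so strongly on the input: a genome that shares no clusters with anything else in the data set simply ends up without edges.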
After running vConTACT2, we can make use of mainly three files:
CRISPR-Cas systems are an adaptive immune system of bacteria.
Here, we look at the CRISPR spacer diversity in order to identify which phages the bacteria have come into contact with. A CRISPR-Cas locus can be divided into two main regions:
1. CRISPR array (repeats and spacers)
2. Cas genes
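The structure of a CRISPR array (repeat, spacer, repeat, spacer, ...) can be illustrated with a small sketch that pulls out the spacers lying between consecutive copies of a known repeat. It assumes exact, non-degenerate repeats, which real arrays do not always have, so dedicated prediction tools should be used for actual analyses. The function name and the example sequences are made up.

```python
def extract_spacers(array_seq, repeat):
    """Return the sequences found between consecutive exact copies of `repeat`."""
    spacers = []
    pos = array_seq.find(repeat)
    while pos != -1:
        # Look for the next repeat after the end of the current one.
        next_pos = array_seq.find(repeat, pos + len(repeat))
        if next_pos == -1:
            break
        spacer = array_seq[pos + len(repeat):next_pos]
        if spacer:
            spacers.append(spacer)
        pos = next_pos
    return spacers


# Toy array: flank + repeat + spacer + repeat + spacer + repeat + flank
repeat = "GTTTC"
array_seq = "AA" + repeat + "ACGTACGT" + repeat + "TTGACCA" + repeat + "GG"
print(extract_spacers(array_seq, repeat))
```

Each spacer is a short stretch copied from a past invader, which is why matching spacers against phage genomes can hint at which phages a bacterium has encountered.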
There are different tools to predict CRISPR and Cas regions. The most common are:
- PILER-CR
- CRISPRCasFinder
Therefore, we first predict CRISPR spacers from the genomes.